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Abstract. In this paper, we present an Uzawa-based heuristic that is adap- 
ted to some type of stochastic optimal control problems. More precisely, we 
consider dynamical systems that can be divided into small-scale independent 
subsystems, though linked through a static almost sure coupling constraint 
at each time step. This type of problem is common in production/portfolio 
management where subsystems are, for instance, power units, and one has to 
supply a stochastic power demand at each time step. We outline the framework 
of our approach and present promising numerical results on a simplified power 
management problem. 



1. Introduction 

Stochastic optimal control is concerned about finding strategies to manage dy- 
namical systems in an optimal way, with respect to some cost function. The par- 
ticularity of such optimization problems is that the optimization variables we deal 
with arc random variables. Indeed, the dynamical systems we consider are partially 
driven by some exogeneous noises and the objective function may also include such 
noises. Hence controls are random variables. Classical approaches such as Dynamic 
Programming and Stochastic Programming, that we briefly recall below, encounter 
difficulties when the system becomes large. The aim of this paper is to present a 
new heuristic to solve a class of such problems using price decomposition. 

The problems we are studying are common in practice. For example, consider a 
physical system, say a set of numerous power units, that evolve depending on exo- 
geneous noises (water inflows, failures) and on controls (production levels). At each 
time step, an observation on the system arises and a control has to be chosen on the 
basis of the available information, namely the past observations (non-anticipativity 
constraint). The objective is to minimize the sum of the units' production costs 
over a given discretized time horizon, while satisfying a global demand constraint 
at each time step. This decision process hence consists of finding optimal strategies, 
i.e. functions that map, at each time t, the available information to the optimal 
decision with respect to the production cost. 

As far as we know, most methods that have been proposed to decompose large- 
scale stochastic optimal control problems are based on Stochastic Programming 
(see [Pre95, SR03]). This approach consists in representing the non-anticipativity 
constraints using a so-called scenario tree. Once discretized on such a structure, 
the problem is not stochastic anymore and various deterministic decomposition 
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techniques have been used to solve it (see [HS96]). In this context, there are two 
main issues that are not easy to deal with. The first is concerned with the "distance" 
between the original problem and its deterministic reformulation [Art91, HRS06], 
or how to draw a scenario tree in such a way that the solution of the discrctized 
problem is an accurate estimate of the original one. In order to obtain some given 
accuracy, [Sha06] shows that the growth of the number of leaves in the tree, hence 
the numerical complexity, has to be exponential with respect to the time horizon. 
The second issue is concerned with the way one can rebuild strategies from optimal 
commands obtained in the discretized problem [Pen05] . 

On the other hand, when dealing with a Markov Decision Process, methods 
based on Dynamic Programming (DP) (see [Bel57, BerOO]) do provide a way to 
obtain strategies as feedback functions with respect to so-called state variables. 
Unfortunately, the well-known curse of dimensionality prevents us from using this 
approach straightforward on large-scale problems, because the computational bur- 
den increases exponcntionally with the state dimension. Numerous approxima- 
tions have been proposed to tackle the difficulty. For instance, a popular idea in 
the field of hydro-power management, introduced by Turgcon in [Tur80], consists 
of obtaining local strategies as a function of the local stock and the aggregated 
complementary stock. Another idea, namely Approximate Dynamic Programming 
(ADP), looks for the value functions (solutions of the DP equation) within a finite- 
dimensional space (see [BD59] or [BT96, §6.5]). To be practically efficient, such an 
approach requires some a priori information about the problem, in order to define 
a well suited functional subspace. Indeed, there is no systematic means to choose 
the basis functions and several choices have been proposed [TVR96, dFVR03]. 

When dealing with large-scale optimization problems, the decomposition/coordi- 
nation approach aims at finding a solution to the original problem by itcrativcly 
solving several smaller-dimensional subproblcms. In the deterministic case, several 
types of decomposition have been proposed (e.g. by prices or by quantities) and 
unified in a general framework using the Auxiliary Problem Principle in [Coh80]. 
In the open-loop stochastic case, i.e. when controls do not rely on any observation, 
[CC90] proposed to take advantage of both decomposition techniques and stochas- 
tic gradient algorithms. These techniques have been extended in the closed- loop 
stochastic case by [BRS07], but so far they fail to provide decomposed state de- 
pendent strategies in the Markovian case. This is because a subproblem's optimal 
strategy depends on the state of the whole system, not only on the local state. In 
other words, decomposition approaches are meant to decompose the control space, 
namely the range of the strategy, but the numerical complexity of the problems we 
consider here also arises because of the dimensionality of the state space, that is to 
say the domain of the strategy. 

We here propose a way to use price decomposition within the closed-loop stochas- 
tic case. The coupling constraints, namely the constraints preventing the problem 
from being naturally decomposed, are dualized using a Lagrange multiplier (price). 
At each iteration, the price decomposition algorithm solves each subproblem using 
the current prices, then uses the solutions to update the prices. In the stochas- 
tic context, prices are a random process whose dynamics are not available, so the 
subproblcms do not in general fall into the Markovian setting. However, on a 
specific instance of this problem, [Str06] has exhibited a dynamics for the opti- 
mal multipliers, and he has shown that these dynamics were independent of the 
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decision variables. Hence it was possible to come down to the Markovian frame- 
work and to use DP to solve the subproblems in this case. Following this idea, we 
propose to choose a parameterized dynamics for these multipliers so that solving 
subproblems using DP becomes possible. The update is then performed using a 
sampling/ regression technique. 

This paper is organized as follows. In §2 we describe the general type of problems 
we are concerned with in this paper. Then, in §3, we recall the Dynamic Program- 
ming equation and highlight the difficulties induced when considering large-scale 
problems. In §4, we present the classical price decomposition approach in Hilbcrt 
spaces and the difficulties encountered when dealing with stochastic optimal con- 
trol problems. Based on these ingredients, we present in §5 a heuristic allowing us 
to solve subproblems using DP. We finally validate this approach on a simplified 
power management problem in §6. 



2. Mathematical framework 

Throughout this paper the random variables, defined over a probability space 
(CI, A, P), will be denoted using bold letters (e.g. W G L 2 (Cl,A,F,W)) whereas 
their realizations will be denoted using normal letters (e.g. w G W). 

In this paper we consider a finite horizon stochastic optimal control problem, 
where T denotes the time horizon. Three types of random variables are involved in 
the problem, namely a state, a control, and a noise. The state X t £ L 2 (CI, A, P, R") 
evolves with respect to dynamics depending on the control Ut G L 2 (CI, A, P, R m ) 
and on some exogeneous noise £ t G L 2 (CI, A, P, W). Unlike deterministic optimal 
control problems, in the stochastic case the time "direction" is of particular impor- 
tance. In order to fulfill the causality principle, the control at a given time step 
t only depends on the observation of noises prior to t. Moreover, we assume that 
the observation available at time t consists of all past noises. In order to math- 
ematically represent such an information structure, we denote by At the er-field 
generated at time t by past noises (£ 1; . . . , so that the control Ut at time step 
t has to be measurable with respect to At ■ These last constraints will be called the 
non-anticipativity constraints. 

The global system consists of N units, whose dynamics and cost functions are 
mutually independent. More precisely, the state X t (respectively the control Ut) of 
the global system writes [x\, . . .,Xf) with X\ G L 2 (Cl, A,¥,R n ') (respectively 
(U\,..., Uf ) with U\ G L 2 (Cl, A, P, R m ')) and n = Y^ =l n< and m = £ 4 =i m, so 
that the global dynamics Xt+i = ft (Xt,Ut,£ t +i) can be written independently 
unit by unit: X\ +1 = f\ (X\, U\, £ t+ i) , i = 1, . . . ,N. In the same way, the global 
cost L t (X t , Ut, £t+i) i s equal to the sum of the local unit costs L\ (X\, U\, £ t+1 ) , 
i = 1, . . . , N. At the end of the time period, each unit i leads a cost K l that only 
depends on the final state X l T of the unit. 

For now, the global problem can be stated independently unit by unit. The 
coupling between the units arises from a set of static Revalued constraints, the 
constraint at time step t reading YliLi 9t (-^"t> ^t) = ( see remark 2 for extensions 
to more enhanced relations). We suppose that all functions j\, L\ and g\ are at 
least Borel measurable. 
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The initial state Xq is assumed to be known. Denoting (Xi, . . . , Xt) by X and 
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There are three types of coupling in Problem (1): 

• The first one comes from the state dynamics (lb) that induce a temporal 
coupling, subsystem by subsystem. 

• The second one arises from the static constraints (Id) that link together all 
the subsystems at each time step t. 

• The third type of coupling comes from the non-anticipativity constraints 
(le), which link together controls relying on the same noise history. If two 
realizations of the noise process are identical up to time t, then the same 
control has to be applied at time t on both realizations. 

We ultimately suppose that noises £ t are independent (white noise). We are 
thus in the Markovian case and it is well known that the optimal control, which is 
a priori a function of all the past noises, only depends on the current state [BerOO]. 

Remark 1 (White noise assumption). If the noises £ t are not independent 1 but 
still have known dynamics, one can always include the necessary noise history in 
the state to come back to the Markovian case. Unfortunately, this usually leads 
to a higher state dimension, and hence a higher numerical complexity in the DP 
framework, as will be explained in §3. 

Remark 2 (Coupling constraints involving noises). It is possible to replace the static 
coupling constraint £^ g\ (X\, U\) = by £^1 g\ (X\, U\) = D t , where D t is 
a random variable representing for instance a global demand. However, expressions 
are then harder to write: in the Markovian case, i.e. when the Dt's are independent 
one from another, D t is observed before choosing the control at time t, so optimal 
controls must depend on both the state X t and the noise D t . 



3. Stochastic Dynamic Programming 

In order to solve stochastic optimal control problems in the Markovian frame- 
work, Bellman proposed in [Bel57] the Dynamic Programming (DP) method. It 
consists of introducing value functions Vt : R™ — > R that represent the expected 



and also the noises Dt introduced in remark 2 
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optimal cost when starting from state x at time t. In the case of Problem (1), it 
reads: 



'T-l N N 
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subject to the same constraints as in (1) and using the convention that if the 
optimization problem (2) is not feasible, then Vt (x) = +oo. The value functions 
are usually computed in a recursive manner using the DP equation: 
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Unlike Stochastic Programming methods, a major advantage of DP is that it 
provides the control TJ t as a feedback function on the state variable X t : 

U t = # t (Xt) • 

Except on very simple examples, Equation (3) cannot be solved analytically, and 
many numerical methods have been proposed. A common practice is to discrctize 
the state space and estimate the expectations using Monte Carlo sampling. Un- 
fortunately, as was mentioned in §1, we are facing the curse of dimensionality: the 
complexity of DP grows exponentially with respect to the state space dimension. 

Moreover, Equation (3) is not decomposable in the sense that it cannot be re- 
placed by the solving of N DP equations depending only on the local state x % . 
Indeed, even if Vt is a sum of functions depending on the local state x l as in (3a), 
this additive property docs not hold for the preceeding time steps because of the 
coupling constraint (3c). Hence, looking for the value function as a sum of func- 
tions depending only on the local state would lead to suboptimal strategics. In 
other words, the local state of a subsystem is not sufficient to take the optimal 
local decision; some global information about the system is necessary. 

Nonetheless, DP remains a seductive approach for small-scale problems since it 
provides a way to obtain feedback functions. Based on a decomposition scheme 
presented in §4, we will describe in §5 a heuristic approach in which Problem (1) is 
decomposed into small-scale sub-problems that wc solve using DP. 



4. Price decomposition 

Let us recall some results about the classical Uzawa algorithm [AHU58], which 
aims at iteratively getting round the static coupling constraint (Id). When the cost 
function is additive, this algorithm is also referred to as the price decomposition 
approach (see [Coh80] for further details). Let us first introduce the Lagrangian of 
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problem (1): 

(T-l N N \ 

E E ( L * ( X *> ^ + X t9l ( x i u\)) + E ^ (x*,) , 
4=0 i=l i=l / 

with At £ i 2 (il, „4, P, R d ) the Lagrange multiplier associated to the coupling con- 
straint (Id) and A = (Ao, . . . , Ay-i)- When the Lagrangian has a saddle point, 
we know from classical duality theory in optimization [ET92] that Problem (1) is 
equivalent to: 

(4a) max min C(X,U, A) 

A X.U 

(4b) s.t. x\ +1 =fi(xi,ui,z t+1 ), yt = 0,...,T-l,\fi = l,...,N, 

(4c) U t is A-measurable, Vi = 0, . . . , T — 1, 

(4d) x t <Xt<x t , Vt = l,...,T, 

(4e) u t <Ut<u tl Vt = 0,...,T-l. 

Recall that the Lagrange multiplier A t can be interpreted as the marginal price 
one should pay for satisfying the coupling constraint (Id). Because of the At- 
measurability of the variables involved in this constraint and of the properties of 
conditional expectation, it is easy to see that we can always choose At to be ^It- 
measurable. 

Let us introduce the dual function ip (A) := minx,c/ £ (X, U, A) subject to con- 
straints (4b), (4c), (4d) and (4e). The key point of the price decomposition algo- 
rithm is that computing ip (A) is much easier than solving the original Problem (1). 
Indeed, one can write: 

/T-l N AT \ 

^ (A) = minE E E { L * (*t. fUt+i) + ^9*t (K U\)) + E K% ( X 't) > 

\t=0 i=l i=l / 
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= E E E ( L t ( x i> u i> + x l9i (xi, ud) + k* (x i T ) , 



x\u z 

L—l \ Z—U / 

so that we replace the solving of an optimization problem with variables (X , U) by 
the solving of TV subproblems with variables (X 1 ', 17 j. 

Given X k , an iteration of the price decomposition algorithm first solves the N 
subproblems: 

(5a) min E MT (l\ (X\, U\,i t+1 ) + xf g\ [X\, U\)) + K* (X* T ) ] 



,t=o 



(5b) s.t. X\ +1 = fi(X\,UU t+x ), Vfc = 0,...,T-l, 
(5c) U\ is A-measurable, V< = 0, . . . , T - 1, 

(5d) 4<X\<x\, Vt=l,...,T, 

(5c) y? t <U\<u\, Vi = 0,...,T-l. 
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The Lagrange multiplier A fe is then updated using a gradient-like algorithm. Under 
standard assumptions 2 , the gradient of ip is: 

N 

i=i 

where x]' k+1 and are the solutions of Problem (5). 

At first sight. Problem (5) looks like a stochastic optimal control problem with 
control U\ and state X l t , the solution of which would be a local feedback on X\. 
This contradicts the fact that the solution of Problem (1) is a feedback function 
on the whole state (X\ , . . . , X^) . In order to understand where this contradiction 
comes from, one has to highlight the role of A in Problem (5) . 

5. Dual Approximate Dynamic Programming 

Let us take a closer look at Problem (5). First suppose that A is a white noise 
process. Then Problem (5) lies in the Markovian framework with state X\ and 
noise (£ t , A t ). The optimal control U\ depends only on the local state X\ and one 
can apply stochastic dynamic programming to solve this small-scale optimal control 
problem. Unfortunately, we do not know anything about the time correlations of 
the price process A. . . 

Let us now consider the general case. Defining {X\, Ai, . . . , A t ) as the state at 
time t, Problem (5) falls in the Markovian setting. In particular, the optimal control 
U\ is \X\, Ai, . . . , A t ) -measurable. However DP in this context proves numerically 
intractable because the state dimension increases with respect to time. 

Consider now an intermediate case, and suppose that the dual variable A has a 
short memory dynamics, for instance that Xt+i only depends on A t and £ t+1 : 

(6) Af+i = ht (At,£ t+ i) ■ 

Using (X\, At) as the state variable at time t, Problem (5) falls in the Markovian 
setting. The state dimension does not increase with respect to time anymore and 
is hopefully small so that Problem (5) can be solved using DP. 
In a very specific instance of Problem (1), namely: 

(T-l N N 
t=0 i=l i=l 

X\ +r = X\-U\ + A\ +1 , :i (I T 1. 

n 

Y,U\=D U Vt = 0,...,T-l, 

»=i 

Ut is .At-measurable, Vt = 0,...,T— 1, 

with £ t = [D t , A^, . . . ,A t ), [Str06] has brought to light such an intermediate 
case. Here the dimension of the state X\ (respectively of the control U\) in the 
subsystem i is rn = 1 (respectively m, = 1), for i = 1, . . . , N. The result is the 
following. 



s.t. 

(7) 



Sec [Dan67] for results on the differentiability of the dual fonction i/i. 
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Proposition 1. If (Dt, A], . . . , A^ t Q T is a white noise process and if there 
exists a € R + such that 7; = aci, Vi = 1, . . . , N, then the optimal Lagrange multi- 
pliers satisfy: 

Xt+i = A t + 1 ± (D t+1 (l + a)-D t - aE {D t+1 ) 
-a(A t+1 -E{A t +i))), 

/ T T-1 \ 

Ao = — jf , fl (l-tt)-^E(A,)- a ^E(C,) . 

a \ s=l s=l J 

Using such a dynamics for the multipliers, it is straightforward to show that 
Problem (7) splits into N independent optimization subproblems. Taking the state 
variable as [X\, A t , Dt), the i-th subproblem can be solved using DP in dimension 
3. In summary, we have replaced one TV-dimensional problem by iV 3-dimcnsional 
problems. 

Note that the proportionality assumption on the cost coefficients in proposition 1 
is rather unnatural. Nevertheless, it shows that, in some cases, there exist dynamics 
for the Lagrange multipliers that is independent of the decision variables. 

To deal with more general cases, we propose to approximate the dual process 
A by some parameterized short-memory process. That is, we try to identify the 
multipliers that are the closest to the optimal ones within a constrained subspace of 
stochastic processes. This approach is similar to that employed in the Approximate 
Dynamic Programming (ADP) method. Since it concerns dual variables rather 
than DP value functions, we refer to this approach as Dual Approximate Dynamic 
Programming. 

The performance of such an approach highly depends on the choice of the sub- 
space of stochastic processes in which we force the multipliers to lie. However, a 
major difference with ADP techniques is that approximating the dual variables may 
lead to violations of the coupling constraints. The larger the chosen subspace, the 
less the coupling constraints will be violated. Moreover, prior information on the 
problem may be useful to devise a suitable dynamics. 

Let us now present the implementation of DADP. We constrain dual variables 
to satisfy: 

(8) A t+ i = h at (A t ,£ t+1 ) , 

where h at is an a priori chosen function parameterized by a t £ R 9 . We denote 
by S the set of all random processes that verify Equation (8) for some real vector 
a = (ai, . . . , a-r-i). Given a vector a k of coefficients, the first step of DADP is to 
solve the N subproblems (5) using DP with state (XJ, At). In order to update the 
Lagrange multipliers, we draw s trajectory samples of the noise £ and integrate the 
dynamics (5b) and (8) using the optimal feedback laws, thus obtaining s trajectory 
samples of X k , U k and A . We then perform a gradient step on A sample by 
sample: 

N 

xk + ha = x ^ + pt ^ gl (xr-r/;-^) , va = i, . . . , s, 

i=l 
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with pt being well-chosen real values. Finally, we apply a regression operator 1Z B 
on the samples \ k+1 in order to obtain a stochastic process A fc+1 lying in S: 

K°(\ k ^) = argmin £ £ Ik - ' 

S.t. = /l Qf (/^fj^t+l) ■ 

This heuristic is outlined in algorithm 1. 



Algorithm 1 Dual Approximate Dynamic Programming 

Require: e > 0, 7 > 0, a shape h a for the prices dynamics, a . 
repeat 
fc <- fc + 1 
for i = 1 to A*" do 

Solve z-th subproblem by DP using parameters a k for the price dynamics, 
and obtain X l,k and U 1 ' . Both implicitly depend on a k . 
end for 

Update paramaters a k : 

N 



t=0,...,T-Xj 



where: 



until llA^ 1 - A fc || < e 



x t+i = Kf ( A t>£t+i ) . Vi = 0,...,T- 1. 



Remark 3 (Convexity of S). The regression operator 1Z S is meant to be a sample- 
based approximation of the projection operator 1Z on S. Since the latter set can 
be non-convex, 7£(A fe+5 ) is not necessarily unique: this may lead to numerical 
instabilities. 

Remark 4 (Enhancement of S). It may be desirable to consider a larger set S in 
order to estimate more accurately the price process. For instance, one can extend 
relation (8) in order to include more memory in the process: 

At+i = h at ^(A T ) r < 4 , (£ T ) T < t+1 ) , 

However, this will in general increase the numerical complexity of DP in the solving 
of the subproblems. 

Remark 5 (Another formulation) . Alternatively, wc could have considered a gradi- 
ent algorithm that iterates directly on the parameters a of the dynamics (8). In this 
case, since we have no restrictions on a, the feasible set would have been convex. 
Unfortunately, because the dynamics (8) may be nonlinear with respect to a, the 
dual function ip introduced in §4 might be non-concave with respect to a. 
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6. Numerical experiments 

Wc tested this approach on a simple power management problem. On this small- 
scale example, we will be able to compare DADP results to those obtained by DP. 
Consider a power producer who owns two types of power plants: 

• Two hydraulic plants that are characterized at each time step t by their 
water stock X\ and power production U\, and receive water inflows 

i = 1,2. These two units are subject to dynamic constraints but are cost- 
free; 

• One thermal unit with a production cost that is quadratic with respect to 
its production . There are no dynamics associated with this unit. 

Using these plants, the power producer must supply a power demand D t at each 
time step t, over a discrete time horizon of T = 25 time steps. All noises, i.e. the 
demand Dt and the inflows and ^ are chosen to be white noise processes. 




In the model we moreover impose small quadratic costs on the hydraulic power 
productions in order to ensure that, at least in the deterministic framework and 
without our approximation, the algorithm would build primal iterates that converge 
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to the optimal solution of the original problem (see [ET92]). The problem reads 3 : 
(9a) mm E (e {U\) 2 + e {U 2 t ) 2 + L t (C/ 3 )] + K 1 (Xy) + K 2 (X 2 




(9b) s.t. Xl +1 =Xl-Ul+e t +i, Vi=l,2, Vf = 0,...,T-l, 
(9c) U\ + U\ + Z7 3 = D t , Vt = 0,... ) T-l, 

(9d) x< < X\ < x\ Vi = 1, 2, Vf = 1, • • • , T, 

(9c) 0<U l t <u\ Vi = l,2, Vi = 0,...,T-l, 

(9£) 0<£/ 3 , vt = o,...,r-i, 

(9g) E7j is <7{D ,^, A, ^.^J-mcasurable, Vz = 1,2,3. 

In this problem, the state X t is two-dimensional, hence DP remains numerically 
tractable and we can use the DP solution as a reference. In order to use DADP, we 
choose an auto-regressive process for the Lagrange multipliers: 

(10a) A t+ i = a t \ t + P t D t+ i + 7t, 

(10b) A =&£><> +7o- 

We then perform the algorithm and depict its convergence in Figure 2. We first 
draw the values of the dual function ip introduced in §4 along with iterations (lower 
curve) and observe that it converges to the optimal value of the original problem 
computed by DP. Note that each value of ip is computed by Monte Carlo simulation 
over 10 3 scenarios. We also draw the cost of the problem with all constraints 
satisfied (primal cost) at each iteration (upper curve). As explained in §5, DADP 
does not ensure that the coupling constraint (9c) is satisfied. To circumvent this 
difficulty, the thermal unit strategy is chosen in the simulation so as to ensure 
feasibility of the coupling constraint, i.e.: 

(11) U\ = D t - {u\ + U 2 ) . 

That is, DADP returns three strategies, for each of the hydraulic units and for 
the thermal unit. However, we use relation (11) for the thermal strategy during 
simulations in order to ensure demand satisfaction. 

Figure 2 shows that the algorithm behaves well, in the sense that the value of the 
objective function converges quite quickly to a neighbourhood of the optimal value. 
However, even after 100 iterations the curve is still a bit noisy. This is because 
the price dynamics employed generates a non-convex-set of stochastic processes. 
Consequently, the least squares problem solved at each iteration is non-linear, and 
hence small variations in the actual gradient can result in large changes in the 
calculated gradient. 

The key to convergence in DADP is to obtain a dynamics for the Lagrange 
multipliers that accurately matches the optimal one. We have represented in Figure 
3 the dynamics of the multipliers computed by DADP after 10, 20, 50 and 90 



3 In this example, we consider two hydraulic plants with characteristics: 
x 1 =0, x 1 = 50, u 1 = 6, K 1 (x) = -7x, 
x 2 =0, x 2 = 40, u 2 = 6, K 2 (x) = -12x, e = 0.1 
where y (resp. y) denotes a lower (resp. upper) bound for variable y. Moreover, producing u with 
the thermal plant costs Lt (u) = u + u . 
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FIGURE 2. Value of the dual function (blue), of the primal function 
(red) and optimum (black) along with iterations. 



iterations, and those derived from DP. We observe that the approximate price 
dynamics issued from DADP satisfactorily converges to the optimal one. This 





Figure 3. Comparison, using 100 samples, of the approximate 
prices trajectories (red) with the optimal ones (black), after 10, 20, 
50, and 90 iterations of the algorithm (from left to right and top 
to bottom). 
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indicates that: 

(1) the dual process converged, although the set of stochastic processes defined 
by (10) is non-convex, 

(2) there is no need to enhance the chosen dynamics (10) for the multipliers in 
this particular problem. 

Note that the optimal prices derived from DP are obtained by numerically dif- 
ferentating the Bellman functions, hence the numerical instabilities we observe in 
the lower parts of the DP prices curves. 

Remark 6 (Numerical complexity). We chose to validate the method on a two- 
dimensional power management problem, so we could compute a reference solu- 
tion using DP. Note that there would be no additional difficulty in implementing 
the same algorithm with a larger number of hydraulic units, i.e. with a larger- 
dimensional state. The complexity of DADP grows linearly with the number of 
subsystems. The most time consuming part of the algorithm is solving the sub- 
problems. However, this calculations can be easily parallelized on a computer, so 
that the time needed at each DADP iteration remains constant with respect to the 
number of subsystems. 

7. Conclusion 

In this paper we present an approach, called Dual Approximate Dynamic Pro- 
gramming (DADP), to solve large-scale stochastic optimal control problems in a 
price decomposition framework, without discretizing randomness. This method em- 
ploys classical duality theory results to solve decomposable large-scale systems, as it 
usually is in the deterministic framework. In order to be able to solve subproblems 
using DP, we suppose that the Lagrange multipliers obey some parameterized dy- 
namics. The DADP algorithm then iterates on the parameters of these dynamics. 
What is original in this approach is the use of a dual variable in the optimal local 
feedback functions as an auxiliary variable that sums up the remaining part of the 
system. 

On an example, we show that this approach is very attractive from a numerical 
point of view. Using rather simple dynamics for the multipliers, we obtained sur- 
prisingly good results with a small number of iterations. The main advantage of 
the method is that the complexity of the algorithm grows linearly with respect to 
the number of subsystems so that the curse of dimensionality is circumvented for 
the considered class of problems. 

There are still several important theoretical questions. Since we constrain the 
dual variables to lie in some a priori chosen subset, we cannot state that the cou- 
pling constraints will be satisfied. Hence it would be useful to be able to evaluate 
the distance between the solution given by the heuristic and the feasible set; this 
would also give clues on how to choose well-suited dynamics for the dual variables 
on particular problems. Furthermore, the stochastic process subset on which we 
constrain the dual variables is possibly non-convex. In this context, it might be 
valuable to use more enhanced numerical methods for the update of the Lagrange 
multipliers. Further studies will be concerned with these issues. 
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